Portfolio Check

The “done” and “coming soon” statuses are just to help me keep track of what I have put into my main portfolio rmd.


Module 01 Portfolio Content

  • Evidence worksheet_01
    • Completion status: Done
    • Comments:
  • Evidence worksheet_02
    • Completion status: Done
    • Comments:
  • Evidence worksheet_03
    • Completion status: Done
    • Comments:
  • Problem Set_01
    • Completion status: Done
    • Comments:
  • Problem Set_02
    • Completion status: Done
    • Comments:
  • Writing assessment_01
    • Completion status: Done
    • Comments:
  • Additional Readings
    • Completion status: Done
    • Comments:

Data Science

  • Installation check
    • Completion status: Done
    • Comments:
  • Portfolio repo setup
    • Completion status: Done
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status: Done
    • Comments:
  • ggplot
    • Completion status:
    • Comments:

Module 02 Portfolio Content

  • Evidence worksheet_04
    • Completion status: Done
    • Comments:
  • Problem Set_03
    • Completion status:
    • Comments:
  • Writing assessment_02
    • CANCELED
  • Additional Readings
    • Completion status: Done
    • Comments:

Module 03 Portfolio Content

  • Evidence worksheet_05
    • Completion status: Done
    • Comments:
  • Problem set_04
    • Completion status:
    • Comments:
  • Writing Assessment_03
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status: Done
    • Comments:

Project 1

  • CATME account setup and survey
    • Completion status: Done
    • Comments:
  • CATME interim group assessment
    • Completion status: Done
    • Comments:
  • Project 1
    • Report (80%): Done
    • Participation (20%): Done

Module 04 Portfolio Content

Project 2

  • CATME final group assessment
    • Completion status:
    • Comments:
  • Project 2
    • Report (80%): Done
    • Participation (20%): Done

Module 01 Part A

Data science Friday

Installation check

Github install

GitBash install

RStudio install

Portfolio repo setup

#created MICB425_portfolio directory on my computer
#created new repository 'MICB_portfolio' on my Github account
git init
git add .
git commit -m "comment text" #comment was 'First commit'
git remote add origin [repository url] #URL was taken from repository page on Github
git remote -v #just to check that URL was correct
git push -u origin master

RMarkdown pretty PDF challenge

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)

hint: go to the PhD Comics website to see if you can find the image above
If you can’t find that exact image, just find a comparable image from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
Table: I made this table with kable in the knitr package library

| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh? Here’s ours! Include a fun gif of your choice!


Module 01 Part B

Origins and Earth Systems


Evidence worksheet 01

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

What were the main questions being asked?
  • How much carbon and nutrient mass is contained in all the prokaryotes on Earth, and what is the total global abundance of prokaryotes?
  • Which environments are significant contributors of prokaryotic biomass and which one is the most important?
  • How can all of these factors actually be measured?
  • How can we estimate the uncertainty of said measurements?
  • What effect does the global microbial population have on biogeochemistry?
What were the primary methodological approaches used?
  • Available numbers were taken from studies from each small environment. They were converted to the same format and units, then averaged for each environment. Each averaged measurement of an environment was extrapolated to be proportionate to the size of said environment. Then they had to do a fair amount of guesstimation, combined with comparisons to older values to see if their results were within a reasonable order of magnitude. They also calculated the results of sections by different methods to see if all gave the same order of magnitude.
Summarize the main results or findings.
  • Nearly all prokaryotic life is found in three main environments: aquatic environments, the subsurface, and soil. Microbial cells contain about half of global organic C and 90% of global nutrients, which is 10x the nutrients contained in plants globally. They also determined the turnover rate of cells in each environment.
Do new questions arise from the results
  • How likely is their calculated turnover time of thousands of years in the subsurface to be accurate? (Dr Hallam says pretty likely)
  • Where does energy for life in the subsurface come from? (Dr Hallam says there isn’t much in total, but what little there is comes from metal reduction)
  • Would the averages for each environment change if more measurements were taken in certain places?
  • Is the biogeochemical effect in the subsurface large, even if the turnover rate, and therefore microbial activity, is so slow?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
  • I think there were a lot of necessary assumptions made to reach their conclusions, but the reason for each assumption was made clear so I was able to follow the logic of how they reached their conclusions pretty well.

Evidence worksheet 02

Evidence worksheet 03

Rockstrom 2009

Learning Objectives

  • Evaluate human impacts on the ecology and biogeochemistry of Earth systems

General questions

What were the main questions being asked
  • Which Earth systems have the potential to cause unacceptable environmental change if pushed too far?
  • What variables describe each of these systems?
  • What are the “threshold” limits of these variables, beyond which the respective system would change too much?
  • Which systems are already pushed past their limits?
  • How are the limit values determined for the systems that are already past their limits?
  • Does each system stand alone, or is it interconnected with the limits set on the others?
What were the primary methodological approaches used?
  • They picked 9 systems that industrial human activity has influenced and that they believed had the potential to cause a great deal of harm to global environments. For each system a variable or two measuring change to the system was chosen. Finally a combination of best guesses and comparisons to pre-industrial values was used to find a safe limit for each variable.
Summarize the main results or findings.
  • The nine systems chosen:
    • climate change
    • rate of biodiversity loss (terrestrial and marine)
    • interference with the nitrogen and phosphorus cycles
    • stratospheric ozone depletion
    • ocean acidification
    • global freshwater use
    • change in land use
    • chemical pollution
    • atmospheric aerosol loading

  • Systems currently past limit values:

| System | Variable | Limit Value | Current Value |
|---|---|---|---|
| Climate Change | CO2 | 350ppm | 418ppm (387ppm at time of article) |
| Climate Change | Radiative Forcing | \(1\ \mathrm{W\,m^{-2}}\) | \(1.5\ \mathrm{W\,m^{-2}}\) |
| Biodiversity Loss | Species Loss Rate | 10x Background Rate | 100-1000x Background Rate |
| Nitrogen Cycle | N2 converted to NO3 or NH4 | \(35 \times 10^{6}\) ton/yr | \(120 \times 10^{6}\) ton/yr |

  • They also concluded that all of these systems are so large that they must have effects on each other, so each limit cannot be taken alone in a vacuum.
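
To see how far past each limit the systems in the table above are, the current values can be divided by the limit values. This is only a rough sketch: the data frame and its column names are made up for illustration, the numbers are the ones listed in the table, and the biodiversity row is left out because it is a range rather than a single value.

```r
# Limit and current values copied from the table above
limits <- data.frame(
  variable = c("CO2 (ppm)",
               "Radiative forcing (W m^-2)",
               "N2 fixed (10^6 ton/yr)"),
  limit    = c(350, 1, 35),
  current  = c(418, 1.5, 120)
)

# Ratio > 1 means the system is past its proposed limit
limits$ratio <- limits$current / limits$limit
limits
```

By this measure the nitrogen cycle is the furthest past its limit of the three.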
Do new questions arise from the results
  • How accurate are the limits and are they useful if not very accurate?
  • Are the systems past their limits beyond hope for restoration?
  • Where does the balance between human industry for the purpose of survival and possible catastrophic environmental damage lie? That is, is there a point where stopping environmental damage from reactive nitrogen runoff saves more lives than the food production afforded by modern, fertilizer-intensive agriculture?
Were there any specific challenges or advantages in understanding the paper?

I thought it was quite straightforward. It was pretty haunting to see their quoted atmospheric CO2 concentration and think “Did they get that wrong?” just to realize this was written 9 years ago and we have already pushed another 30ppm past the limit proposed here.


Problem set 01

Whitman et al 1998

Learning objectives:

  • Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text
  • Aquatic - Most aquatic prokaryotes are found in the open ocean. They have a short turnover time and therefore a high cellular productivity, which means that mutations and other rare genetic events are more likely to occur here than in other habitats.

  • Subsurface - Major habitat for prokaryotes, with most of the subsurface biomass supported by organic matter deposited from the surface.

  • Soil - Major reservoir of organic carbon; prokaryotes are essential in soil decomposition

| Environment | Aquatic | Subsurface | Soil |
|---|---|---|---|
| Total abundance (cells) | \(1.18*10^{29}\) | \(3.8*10^{30}\) | \(2.556*10^{29}\) |

What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
  • Upper 200m of the ocean: \(3.6*10^{28}\)
  • Density: \(5*10^5\) cells/mL

  • Fraction represented by cyanobacterium including Prochlorococcus: 8%
  • Cyanobacteria: \((4*10^4\text{ cells/mL})/(5*10^5\text{ cells/mL}) * 100 = 8\%\)

  • Cyanobacteria such as Prochlorococcus produce their own energy from sunlight via photosynthesis, which in the process produces oxygen while fixing carbon. Despite only being 8% of the prokaryotic cell abundance in the upper 200m, they are responsible for approximately 50% of the oxygen in the atmosphere and contribute greatly to carbon cycling as demonstrated by their quick turnover time, resulting in \(8.2 * 10^{29}\) cells/year.
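
The 8% figure above can be checked with a couple of lines of R. This is just a sketch; the variable names are made up for illustration and the densities are the ones quoted in the bullets.

```r
# Cell densities quoted above (cells per mL, upper 200 m of the ocean)
cyano_density <- 4e4   # cyanobacteria, including Prochlorococcus
total_density <- 5e5   # all prokaryotes

# Percentage of prokaryotic cells that are cyanobacteria
cyano_percent <- cyano_density / total_density * 100
cyano_percent
```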

What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
  • Autotrophs - bacteria that produce their own food, primarily using energy from the sun. In this paper only marine autotrophs are considered, and the overwhelming majority of them are said to be Prochlorococcus.

  • Heterotrophs - use organic carbon as an energy source and carbon source. They are the overwhelming majority of cells on Earth.

  • Lithotrophs - prokaryotes that gain energy from something other than organic carbon or sunlight. They are said to be found in small amounts in the subsurface, where organic carbon still sustains most life.

Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
  • The Mariana Trench is the deepest part of the ocean at 10.99km depth. According to this paper, temperature becomes limiting around 4km deep in the ground. Depending on how that temperature/depth dependence scales with bottom depth in the ocean, I would expect the deepest life to be somewhere in the sediment below the bottom of the Mariana Trench. Temperature becomes the limiting factor for life around 125C.
Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
  • Mount Everest is the highest point connected to Earth at 8.8km. There is likely prokaryotic life a few meters in the ground near the summit where temperature is stable. This paper mentions cells found as high in the atmosphere as 77km. The ISS, which orbits at 408km above sea level, definitely has bacteria living inside the astronauts onboard, if not on surfaces of the craft. Voyager 1 is currently the furthest man made object above sea level at 21 000 000 000km from Earth. It was sterilized as it was built in 1977, but if any contamination occurred, it could carry the highest living organisms. At altitude carbon becomes a limiting factor if the cells are not phototrophic or lithotrophic. UV radiation in the upper atmosphere or outer space could damage cells.
Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
  • I agree with the paper that the atmosphere should be included in the definition of the biosphere so: 77km above sea level + 11km below sea level = 88km thick biosphere. The densest band would be between -4km and +4km where most of the surface and subsurface life is.
How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
  • Cells/year = Population Size * (turnover/year)

  • For marine autotrophs:
  • \(2.9*10^{27}cells * (365(days/year)/1.5days) = 7.1*10^{29}cells/year\)
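
The same calculation can be written as a small R helper. This is only a sketch of the formula above; the function name `annual_production` and its argument names are made up for illustration.

```r
# Cells/year = population size * (turnovers per year)
annual_production <- function(population, turnover_days) {
  turnovers_per_year <- 365 / turnover_days
  population * turnovers_per_year
}

# Marine autotrophs: 2.9e27 cells, 1.5 day turnover
autotroph_cells_per_year <- annual_production(2.9e27, 1.5)
signif(autotroph_cells_per_year, 2)
```

Keeping it as a function makes it easy to repeat the calculation for other rows of the table by swapping in their population size and turnover time.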

What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
  • Carbon content along with carbon assimilation efficiency determine the upper limit on the turnover rate. This varies with depth in the ocean, and between terrestrial and marine habitats because the abundance of carbon in each habitat is different. The authors used a calculated turnover rate in the top 200m of the ocean from the amount of net productivity in the ocean and compared it to direct measurements of generation time and turnover rate in the surface ocean. Because these values agreed they calculated turnover time for other environments the same way.
How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of \(4 \times 10^{-7}\) per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)

Example for heterotrophic cells in the top 200m of the ocean:

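  • Mutation rate: \(4 \times 10^{-7}\) mutations/generation
  • For 4 mutations to happen at once:
  • \((4 \times 10^{-7})^{4} = 2.56 \times 10^{-26}\) mutations/generation
  • 365 days/yr ÷ 16 days/turnover = 22.8 turnovers/yr
  • \(3.6 \times 10^{28}\text{ cells} \times 22.8\text{ turnovers/yr} = 8.2 \times 10^{29}\text{ cells/yr}\)
  • \(8.2 \times 10^{29}\text{ cells/yr} \times 2.56 \times 10^{-26}\text{ mutations/generation} = 2.1 \times 10^{4}\text{ mutations/yr}\)
  • \(\frac{1}{2.1 \times 10^{4}\text{ mutations/yr}} \times 365\text{ days/yr} \times 24\text{ h/day} = 0.4\) hours per 4 simultaneous mutations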
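
The chain of steps above can be run in R to check the arithmetic. This is a sketch only: the variable names are made up for illustration, and the population size of \(3.6 \times 10^{28}\) cells is the upper 200m value used earlier in this problem set.

```r
mutation_rate <- 4e-7     # mutations per gene per DNA replication
population    <- 3.6e28   # heterotrophic cells in the upper 200 m
turnover_days <- 16

# Chance of four simultaneous mutations in one replication
four_at_once <- mutation_rate^4

# Cells produced per year = population * turnovers per year
cells_per_year <- population * (365 / turnover_days)

# Expected four-mutation events per year, then hours between events
events_per_year <- cells_per_year * four_at_once
hours_per_event <- 365 * 24 / events_per_year
round(hours_per_event, 1)
```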
Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
  • A large mutation rate means that there is a great potential for multiple point mutations in a single replication. This allows for quick adaptation by creating a more diverse pool of mutants to be selected from. Genetic diversity will be extremely high when small scale changes to sequence are considered and long term “species” level biodiversity will mostly be determined by competition and environmental pressures. Horizontal gene transfer, plasmid loss, and gene loss can be a significant source of diversity outside of point mutations. HGT likely possesses the highest adaptation potential for a community.
What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?
  • High abundance allows for high diversity by increasing the potential for mutations and simultaneous mutations. Metabolic potential is dependent on both abundance and diversity. Diversity determines the pool of available genes to be used in metabolic pathways and abundance determines the magnitude of the effect of these pathways. Metabolic potential also influences the community’s mutational potential because it lowers turnover time.

Problem set 02

Falkowski et al. 2008

Learning goals

  • Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

General questions

What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
  • Tectonic movement along with photochemical reactions in the atmosphere allow for mixing and partitioning of chemical substrates on Earth.

  • Geophysical (abiotic): Tectonics (Acid/base), transfer of protons without electrons, atmospheric photochemical processes (photochemistry/radical chemistry = ‘magic’)
  • Biogeochemical(biotic): (Redox)

Why is Earth’s redox state considered an emergent property?
  • The Earth’s redox state is considered an emergent property because it depends on both geochemical processes and microbial metabolic processes.
How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
  • Reversible electron transfer reactions give rise to element and nutrient cycles because of the thermodynamic conditions that make each reaction favorable. Specifically, the rate at which each reaction occurs is determined by the conditions of the environment (i.e. abundance of substrates, products, etc.), thus allowing nutrients to cycle in a stable manner.
Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
  • The different stages of the nitrogen cycle all require different amounts of oxygen, corresponding to different redox niches and to the other microbes that will be in that environment. In nitrogen fixation, nitrogen gas is fixed from our atmosphere into ammonium. This is done by nitrogenase, which is inhibited by oxygen, yet this is still done in many microbes in aerobic environments. Nitrification, the oxidation of ammonium to nitrite and finally nitrate, happens in aerobic environments. Denitrification, the anaerobic reduction of nitrate and nitrite to nitrogen gas, occurs in anoxic/anaerobic environments. Indirectly, the nitrogen cycle is connected to climate change. All microbes require nitrogen to synthesize protein and nucleic acids, and the only biological method of nitrogen fixation is via microorganisms. The nitrogen cycle is what controls the amount of available fixed nitrogen, which in turn affects the number of microbes carrying out various other reactions, which in turn produces the Earth’s atmosphere. Additionally, if something were to shift in the nitrogen cycle and affect those microbes involved, it would in turn affect the overall microbial community. The nitrogen cycle does not directly affect climate change, as only nitrogen and hydrogen gas are emitted during nitrogen fixation, nitrification, or denitrification, neither of which is a greenhouse gas.
What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
  • Although there is enormous genetic diversity in nature, there remains a relatively stable set of core genes coding for the major redox reactions essential for life and biogeochemical cycles. Thus, microbial diversity does not necessarily entail diversity in proteins involved in metabolism.

  • It is hypothesized that there is limitless evolutionary diversity in nature. The rate of discovery of unique protein families has been proportional to the sampling effort, with the number of new protein families increasing approximately linearly with the number of new genomes sequenced.

On what basis do the authors consider microbes the guardians of metabolism?
  • Temporary and simultaneous guardianship because of the nature of microbial evolution from horizontal and vertical gene transfer, which can change which phenotype is dominant at the time. A dominant phenotype protects the metabolic pathway in the environment, and if it does not survive environmental perturbations applying selective pressures on pathway genes, then it will disappear. Humans could possibly replicate the individual pathways, but the overall metabolic biogeochemical processes that control the flow of electrons can only be done by microbes.

Writing Assignment 1

Prompt

“Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” Do you agree or disagree with this statement? Answer the question using specific reference to your reading, discussions and content from evidence worksheets and problem sets.

Response

Microbial actions and human life are indisputably intertwined. Humans, however, are not wholly dependent on the ecological catalysis of microbial life for the survival and proliferation of our species. Prokaryotic life catalyzes several important biogeochemical cycles globally, providing immense shifts in the redox states of nearly all biologically important compounds via their metabolic processes. Currently, both human activities and microbial metabolisms contribute to nutrient cycling on Earth, with microbes bearing most of the weight, but this balance is changing at an accelerating rate.

Organisms with consciousnesses have the ability to apply their efforts directionally towards a specific change. As humans, we can make plans while taking future conditions into account, we can cooperate towards a goal, and we can place a value on the success of individuals and species beyond the self. Microbes are bound by the forces of natural selection, and changes must be beneficial to an individual cell in order to be passed on to the next generation. Human consciousness gives humanity the capacity to make faster and more drastic changes to the biogeochemical landscape than microbial life.

Over the two centuries since the industrial revolution, human industry, technology, and understanding of the universe have increased exponentially. Compared to the temporal scale on which geological processes and shifts in net global microbial function occur, all of human progress is only a tiny blip on the tail end of history. Although human impact on biogeochemical cycles is minimal compared to that of prokaryotes in the present day, the pattern of exponential increase in humanity’s capacity to alter the environment, combined with recent emerging biotechnologies, makes for a compelling argument that microbes will become replaceable in the not so distant future.

The potential for human processes to excel beyond microbial processes is facilitated by consciousness. This emergent property gives humanity three main abilities that mitigate the normal evolutionary selection processes: the ability to cooperate towards a goal that may not immediately be beneficial to all contributors; the ability to take future impacts of actions into account when making current plans; and to place an innate value on the lives of humans and other species, both through compassion and through economic market forces. Together, this means that people can establish how a system functions, allocate resources towards a change in said system, and enact a plan not only to change the system in a specific direction, but to change it in what is determined to be the best direction. Conversely, changes to net prokaryotic population function, another emergent property, are determined solely by whether a small change is beneficial to the ability of an individual to reproduce. Natural selection places limits on the capacity of microbes to enact change on biogeochemical cycles compared to humans.

Both the maximum potential for change to biogeochemical cycles and the rate at which this potential can increase for microbes have boundary limits. The rate that the global, net prokaryotic metabolism can change is limited by the rate at which cells divide and the rate at which the global microbial genetic pool can be altered. Although the estimated number of prokaryotic cells on Earth is astronomically large, on the order of \(10^{30}\), nearly all of them live in the terrestrial subsurface and have an average turnover rate on the scale of centuries (Whitman et al., 1998). Within geological time scales, this ‘silent majority’ is extremely active and relevant, but on the time scale of modern human environmental intervention, the division rate of these hidden cells makes their tremendous abundance much less consequential.

Cell division is also limited by energy availability. The primary input of energy to global biological systems is photosynthetic carbon fixation by higher plants and photosynthetic bacteria. The rate at which sunlight is transformed to an ecologically available energy source by a given photosynthetic population and the rate at which this energy can be disseminated to other organisms in the deep ocean and terrestrial sediment both place boundaries on the maximum global rate of microbial production. Global metabolic catalysis is dependent on the energy supplied to biological processes, and microbes have physical limitations to both maximum energy production and energy transfer between cells.

Beyond cellular division, genetic variation is necessary for changes to prokaryotic metabolic function, which in turn determines the ability for the global prokaryotic population to alter biogeochemical cycles. Horizontal gene transfer and the extremely high abundance of cells on Earth make useful mutations extremely common, even on small temporal scales. It is estimated that four simultaneous mutations occur in a cell every half hour, in the surface ocean alone (Whitman et al., 1998); however, this does not mean that the pool of available genes changes quickly. For a mutation to be heritable, it cannot be lethal. This presents a hard limit on the extent of change that can happen to genetic sequences in a single generation. Proteins vital to the survival of an individual cell, such as the metabolic enzymes relevant to many biogeochemical cycles, cannot be completely changed by mutation to a single cell in a single generation. Instead, functional diversity is the cumulative change to sequences over long time periods. Natural selection in a varied pool of random mutations is a system that strongly favours improvement to existing structures over the introduction of truly novel ones. The core set of proteins that carry out the metabolic redox reactions which drive global biogeochemical cycles was developed extremely early in the history of life on Earth, and is still highly conserved (Falkowski et al., 2008). Microbial populations have boundary conditions that limit metabolic rate and functional change, imposed by both the processes of mutation and selection, and the rate of energy acquisition and distribution within a biological system. In the context of human activity, these boundaries have different limits, and it may be possible to mitigate them completely in the near future.

Recent history provides evidence of the potential for human activity to be the dominant controller of global nutrient cycles. Humanity has been raising the limits of energy acquisition and distribution since the first use of controlled fire, nearly 600 000 years ago (Berna et al., 2012). The ability to obtain and use energy more efficiently has increased along with the development of human civilizations. The first agriculture marked the beginning of a steady march toward increasingly efficient conversion of sunlight to available food sources, and made the first large-scale energy distribution network necessary: the transport and trade of food. The beginning of the industrial revolution marked the shift away from human bodies as the primary means of energy conversion from chemical to other forms. Vast canal systems for coal distribution made up the second, higher throughput energy distribution system. Finally, the discovery of electricity, along with the wide scale adoption of oil as a fuel source, ushered in the third generation of power production and distribution. In modern times, vast amounts of energy are produced by a ‘metabolism’ of human activity. Electrical and chemical energy are distributed along global networks of wires, pipes, and roads.

Energy availability becomes less of a limitation to maximum human impact on biogeochemical cycles every year. At the present date, human industry already rivals the magnitude of influence on nitrogen and carbon cycling by microbial metabolisms. Atmospheric carbon dioxide measurements show that the interannual increase in carbon dioxide is due to anthropogenic combustion of fossil fuels, indicating that humanity already has the power to be the deciding factor in carbon cycling but has not yet implemented directional control (NOAA, 2018). Likewise, the Haber process has allowed human activity to synthetically reduce massive amounts of nitrogen gas to ammonium for use in agriculture. At the turn of the millennium, humans produced about half of all nitrogen fixed annually, and this value has been increasing exponentially since the 1940s (Rockstrom, 2009; Vitousek et al., 1997). Besides the conversion of organic matter to carbon dioxide, and the conversion of nitrogen gas to ammonium, human industry has the capacity to upset nearly any step in global biogeochemical cycles, should the current microbial processes become insufficient.

Man-made fixed nitrogen and carbon dioxide have both increased exponentially as global energy production has risen. However, the limitation of energy availability will not last much longer. A crude exponential fit of global energy production from 1820 to 2010, extrapolated to the year 4000, shows that humanity’s annual energy consumption would match the entire annual energy output of our sun in roughly another 1800 years if production continues on the trend set since the industrial revolution (Fig. 1). In all likelihood, a factor other than energy availability, such as the maximum carrying capacity for human life on Earth, will set a new limit on human progress long before a Dyson sphere is needed, but the key point is that potential human energy production is virtually limitless compared to the limited photosynthetic rate that supplies energy to microbial metabolism.


Figure 1: Data from Vlachogianni and Valavanidis, 2013. Solid blue line: global energy consumption per year. Dotted blue line: extrapolated energy use based on an exponential fit to post-industrial energy production (R² = 0.92). Horizontal red line: annual energy output of the sun. Vertical red line: model intercept with the sun’s energy output, year 3875.
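
The extrapolation in Figure 1 can be sanity-checked with a few lines of arithmetic. The sketch below is only an illustration: the 2010 consumption value and the solar output are rough round numbers assumed for this example, not the data set behind the figure, and the function names are hypothetical. It asks what constant exponential growth rate would be needed for annual consumption to reach the sun's annual output by the intercept year.

```python
import math

# Rough, illustrative inputs (assumptions, NOT the values used for Figure 1)
SOLAR_LUMINOSITY_W = 3.8e26        # approximate power output of the sun, in watts
SECONDS_PER_YEAR = 3.156e7
SUN_ANNUAL_J = SOLAR_LUMINOSITY_W * SECONDS_PER_YEAR  # joules emitted per year

E_2010_J = 5e20                    # assumed global energy use in 2010, in joules


def intercept_year(e0, year0, growth_rate, target):
    """Year at which e0 * exp(growth_rate * (t - year0)) reaches target."""
    return year0 + math.log(target / e0) / growth_rate


def implied_growth_rate(e0, year0, target, year_hit):
    """Constant exponential rate needed to grow from e0 to target by year_hit."""
    return math.log(target / e0) / (year_hit - year0)


# Growth rate implied by an intercept in the year 3875
rate = implied_growth_rate(E_2010_J, 2010, SUN_ANNUAL_J, 3875)
print(f"implied growth rate: {rate:.4f} per year")

# Round trip: plugging the rate back in recovers the intercept year
print(f"intercept year: {intercept_year(E_2010_J, 2010, rate, SUN_ANNUAL_J):.0f}")
```

With these assumed inputs the implied rate comes out a little under 2% per year, which shows how sensitive the intercept is to the fitted exponent: small changes in the rate move the intercept by centuries.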

If the energy available to humans far outpaces that of microbes, the other factor at play is diversity of function and its rate of function. The human analogue for the global microbial gene pool is the sum of human knowledge and available computational power. Computational power has increased exponentially since the first integrated circuits in the 1960s. Transistor density has followed Moore’s Law by doubling roughly every two years, although this is expected to stop in the near future as transistors become small enough for quantum effects to interfere with reliable circuit operation (Chien and Karamcheti, 2013). Gallium and other alternatives to silicon are being explored to put off the end of Moore’s Law, but these are all just stopgaps, and eventually transistor density must plateau due to physical limitations. However, computational power can still increase exponentially without an increase in transistor density, as long as there is enough available energy to fabricate and run more computer chips. Energy production is not likely to reach a maximum limit before Moore’s Law ends, meaning future computational power will be tied to energy production, a value that has been increasing exponentially for two centuries.

The human analogue to microbial genetic diversity is the diversity of technologies available. The sum of human knowledge has increased exponentially as energy availability and societal changes have allowed for greater resource allocation to research. Specifically, the science of microbiology is only in its infancy. Microbes were first observed a mere 400 years ago and medical microbiology exploded just 100 years ago. DNA was first imaged 70 years ago and molecular techniques in biology have become increasingly complex since then. Humanity’s collective understanding of how life functions, including how microbes impact geochemical cycles, has increased at an accelerating rate throughout all of human history. It is reasonable to expect that given increased energy and resources going forward in time, knowledge of biotic chemistry will continue to increase exponentially. All of human history is a raindrop in the ocean of time, where significant changes to genetic diversity and geologic equilibria have occurred. Furthermore, the period of time since people have begun to tease apart the intricacies of life on a microscopic scale is only a molecule of water in that raindrop. Right now, new human technologies can develop considerably faster than new microbial functions. Humanity is on the tipping point of making prokaryotes obsolete.

With the assumption that anthropogenic energy production and knowledge of the universe will continue to increase exponentially into the next millennium, humanity is poised to make the metabolic catalysis of biogeochemical cycles by microbes unnecessary. Consciousness has allowed higher boundaries on rates of change to the environment for humans than for microbes, as evidenced by the current upsets to global carbon and nitrogen cycles. The potential for humans to alter biogeochemical processes will increase much faster than biological or geological systems will be able to adapt. The next millennium will mark the point where sufficient energy and technology will be available to humans to make prokaryotic processes antiquated and irrelevant.


Module 01 references

Whitman WB, Coleman DC, and Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci USA. 95(12):6578–6583. PMC33863

Rockstrom. 2009. A safe operating space for humanity. Nature. 461(24). DOI:10.1038/461472a

Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science. 320(1034). DOI:10.1126/science.1153213

Berna et al. 2012. Microstratigraphic evidence of in situ fire in the Acheulean strata of Wonderwerk Cave, Northern Cape province, South Africa. PNAS. 109(20):E1215-E1220. DOI:10.1073/pnas.1117620109

NOAA. 2018. Recent Monthly Mean CO2 at Mauna Loa.

Vitousek et al. 1997. Human Domination of Earth’s Ecosystems. Science. 277(5325): 494-499. DOI: 10.1126/science.277.5325.494

Vlachogianni and Valavanidis. 2013. Energy and Environmental Impact on the Biosphere Energy Flow, Storage and Conversion in Human Civilization. Science and Education Publishing.

Chien and Karamcheti. 2013. Moore’s Law: The First Ending and a New Beginning. Computer. 46(12):48-53. DOI: 10.1109/MC.2013.431


Module 02

Remapping the body of the world


Evidence worksheet 04

Martinez et al. 2007

Learning Objectives

  • Discuss the relationship between microbial community structure and metabolic diversity
  • Evaluate common methods for studying the diversity of microbial communities
  • Recognize basic design elements in metagenomic workflows

General questions

What were the main questions being asked?
  • Can light driven ATP synthesis be transferred to a heterologous bacterium in a single genetic event?
  • How does the proteorhodopsin photosystem function and what is its structure?
What were the primary methodological approaches used?
  • Screen for PR-containing clones on retinal-containing LB agar plating medium, by looking for red or orange pigmentation.
  • Sequence a collection of transposon-insertion clones to find fosmids that probably contain the PR photosystem
  • They also analyzed different transposon-insertion mutants and measured the accumulation of pigment intermediates by HPLC to infer the function of each disrupted gene.
  • In another experiment they measured pH changes in E. coli carrying the fosmids to determine whether the fosmids independently expressed a functional photosystem with light-activated proton-translocating activity.
Summarize the main results or findings.
  • Two fosmids were identified that contained the genes necessary and sufficient for proteorhodopsin-based phototrophy. These were cloned into E. coli cells, and both exterior pH and interior ATP concentration were shown to change when the E. coli cells were exposed to light. Further, they showed that these fosmids contained genes sufficient to produce retinal (the PR cofactor) as long as the cells already produced the intermediate FPP, which E. coli and many other bacteria do. Fosmid copy number affected how readily the pigmented phenotype could be identified. The clones also had high similarity to other PR-containing BAC clones from Alphaproteobacteria from the Mediterranean and Red Seas.
Do new questions arise from the results?
  • How much variation in this grouping of genes is there naturally?
  • Are retinal pathways usually close to the PR gene? Are they transferred together as a rule?
  • Are these genes (retinal pathway + PR) usually located on a plasmid or integrated into the bacterial genome in natural communities?
  • Would you expect to see this gene set distributed across a lot of phyla?
Were there any specific challenges or advantages in understanding the paper?
  • At first I was confused by how each set of experiments fit together, but on a second reading it meshed better in my head. I also didn’t know what a fosmid was and learning that helped a lot.
  • Fosmids are large single-copy plasmids (based on the F plasmid origin of replication) capable of holding really large inserts (up to ~40 kb).
  • Fosmids normally have a single-copy origin of replication, but the ones used in this paper have a second, inducible origin of replication.
  • Mutant Rep proteins cause a high copy number by continuously replicating the fosmid.
  • The Rep proteins are under an arabinose-inducible promoter, so under inducing conditions the mutant Rep proteins are expressed and cause constitutive fosmid replication.

Problem set 03

Questions

How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLs and applications)?
What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?

Module 02 references

Martinez A et al. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. PNAS. 104(13):5590-95. DOI: 10.1073/pnas.0611470104


Module 03

Microbial Species Concepts


Evidence worksheet 05 - Part 1

Welch et al. 2002

Learning Objectives

  • Evaluate the concept of microbial species based on environmental surveys and cultivation studies.

  • Explain the relationship between microdiversity, genomic diversity and metabolic potential

  • Comment on the forces mediating divergence and cohesion in natural microbial communities

General questions

What were the main questions being asked?
  • How different are these 3 genomes beyond their 16S sequences?
  • What is a microbial species?
  • How similar can E. coli be to Salmonella?
  • How does selection due to an environment influence transfer of gene islands?
  • Should natural environment be used to define a species, or just to define an ecotype?
What were the primary methodological approaches used?
  • Random clones were sequenced with dye terminator chemistry (automated Sanger) on 3700 machines, then assembled with SEQMANII (older contig/scaffold software?) to create a complete genome for the CFT073 strain of E. coli with 7x coverage. Genes from this genome and two other preexisting genomes were annotated using an online system called MAGPIE, where many people could contribute. Genes were predicted based on ORFs and searched with BLAST to automatically annotate any gene that already had a match.
Summarize the main results or findings.
  • There was a similar backbone (evolving mostly through vertical gene transfer, VGT) in all 3 strains, but each strain had many gene islands that were different from those in the other strains. These islands were acquired by horizontal gene transfer and often contain specific genes necessary to survive in the strain’s niche environment. For example, many of the islands contain pathogenicity genes that are necessary to infect humans specifically in the urinary tract or the gastrointestinal tract. Some islands contained genes extremely similar to those of other genera that infect the same part of a human, like Salmonella. CFT073 doesn’t have a type III secretion system or the phage/plasmid-encoded virulence genes that are common in O157:H7. All three strains have a pilus gene similar to that of Salmonella enterica, but amino acid sequence identity is only 53-81%. This suggests that the strains experience different selective pressures in their specific environments. Only 39% of sequence was shared between all 3 strains, but they were considered the same species based on their 16S sequences.
Do new questions arise from the results?
  • Do other microbial species have this large of a variance in gene presence between strains?
  • Is this variance more common in pathogenic strains?
  • Is 16S a reasonable way to classify microbial species if 3 strains that match to the same species based on 16S share only 39% of their genes?
  • Should environment be considered when defining a species?
  • Should strain or ecotype be considered a more important taxonomic classification level than just species?
  • Are certain gene islands always correlated with life in a specific environmental niche?
  • Can a quantitative species definition cutoff ever be realistic, if sequencing error and differences between strains and even individuals are considered?
Were there any specific challenges or advantages in understanding the paper?
  • The techniques for data processing were not very clear. Combined with a very brief introduction it made it difficult to follow the connections they were making between their new sequence data and outside sources. However, I do not think this is necessary to be able to understand the important parts of the paper. It isn’t really about sequencing a new genome, it is about the larger picture of how species definition and community structure can and should be described.
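
The "shared between all 3 strains" figure discussed above is, in essence, a set calculation: the genes found in every strain divided by the genes found in any strain. The sketch below shows that calculation on made-up gene identifiers; the strain contents and the function name are hypothetical and are not taken from Welch et al.

```python
def shared_fraction(*gene_sets):
    """Fraction of the combined gene pool that is present in every strain."""
    pan = set().union(*gene_sets)                   # genes seen in any strain
    core = set.intersection(*map(set, gene_sets))   # genes seen in all strains
    return len(core) / len(pan)


# Hypothetical toy strains: g1 and g2 form the shared backbone,
# and the remaining genes stand in for strain-specific islands.
strain_a = {"g1", "g2", "g3", "g4"}
strain_b = {"g1", "g2", "g5"}
strain_c = {"g1", "g2", "g6", "g7"}

print(f"shared fraction: {shared_fraction(strain_a, strain_b, strain_c):.2f}")
```

In this toy case two of seven genes are shared, so strains with an identical backbone can still have most of their combined gene content differ.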

Evidence worksheet 05 - Part 2

Welch et al. 2002

Learning Objectives

  • Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
  • Identify common molecular signatures used to infer genomic identity and cohesion
  • Differentiate between mobile elements and different modes of gene transfer

Question

Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.
  • The X axis is position along the backbone genome (the whole genome minus island insertion distances). Each vertical deviation is the size of a gene island at that location, and which strain it is present in is shown by direction. An ecotype is a specific version of a species of a microbe that is adapted to a certain niche habitat. In the context of the human body this could be life in the mouth vs life in the gut. The two strains of E. coli here, CFT073 and EDL933, have acquired and retained gene islands that help them survive in the urinary tract or intestinal tract, respectively. Some islands are in both strains. These likely contain genes that are useful just for pathogenic life in the human body, regardless of specific microenvironment. Islands that are only in one strain were likely acquired from other strains or species through horizontal gene transfer. Islands that are present in both strains could be the result of horizontal gene transfer between them, horizontal gene transfer with a third party by both of them, or inherited vertically from a common ancestor.

Problem Set 04

Candy community counting

Questions

How does the measure of diversity depend on the definition of species in your samples?

In order for a sample to be diverse, individuals must be divided into groups based on some sort of differentiation. Species definition is how these groups are chosen. Both the Simpson Index and Chao1 richness values change if the number of species changes. This means that if your species definition is more granular and has a higher taxonomic resolution, a sample will appear more diverse than the same sample analysed using a coarser definition of what a species is.
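
The effect of species definition on these two measures can be illustrated with a small sketch. The candy counts below are invented for the example, and the formulas are one common choice that may differ from the exact variants used in class: the inverse Simpson index 1/Σp², and the classic Chao1 estimator S_obs + F1²/(2·F2), falling back to the bias-corrected form when there are no doubletons.

```python
from collections import Counter


def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared relative abundances."""
    total = sum(counts)
    return 1.0 / sum((n / total) ** 2 for n in counts)


def chao1(counts):
    """Chao1 richness estimate from singleton (F1) and doubleton (F2) counts."""
    s_obs = sum(1 for n in counts if n > 0)
    f1 = sum(1 for n in counts if n == 1)
    f2 = sum(1 for n in counts if n == 2)
    if f2 == 0:
        # Bias-corrected form avoids dividing by zero when there are no doubletons
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)


# Hypothetical sample, with species defined as brand + colour (fine definition)
fine = Counter({
    ("skittle", "red"): 1,
    ("skittle", "green"): 2,
    ("m&m", "blue"): 1,
    ("m&m", "brown"): 3,
    ("gummy bear", "red"): 2,
})

# Same candies, with species defined by brand only (coarse definition)
coarse = Counter()
for (brand, _colour), n in fine.items():
    coarse[brand] += n

for label, community in (("fine", fine), ("coarse", coarse)):
    counts = list(community.values())
    print(label, round(inverse_simpson(counts), 2), chao1(counts))
```

The same nine candies give a higher inverse Simpson index and a higher Chao1 estimate under the finer definition, purely because of how the groups were drawn.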

Can you think of alternative ways to cluster or bin your data that might change the observed number of species?

We did not draw any distinctions based on colour, deciding that different colours marked different strains within the same species. If we divided species by colour as well as brand, we would have many more species in the same community. Alternatively, we could have grouped all gummy candies together, all chocolates together, or all round candies together to end up with fewer total species. Certain very specific distinctions could also have been used to raise our species count, such as separating dark chocolate M&M’s from milk chocolate ones into two species.

How might different sequencing technologies influence observed diversity in a sample?

Sanger and Illumina sequencing both use PCR before sequencing, whether for raising the template concentration or for cluster generation. PCR can introduce changes to the sequence when Taq DNA polymerase mismatches a base, but these usually do not make it through to the final sequence because base calling in both systems is based on an integrated signal from many molecules at once. However, if an incorrect sequence is generated in an early PCR cycle, it can continue to be replicated and eventually make up a sizeable portion of the final sequence pool being observed. This could cause incorrect sequences to be reported. Another problem that PCR introduces is chimera generation. Sequences can recombine partway through replication and create new, hybrid sequences containing parts of two different original sequences. This new sequence will be replicated and eventually sequenced. Once sequences are obtained and it is time to bin them into species, PCR error will blur the edges of clusters of similar sequences, even if they should all be contained in one species. More importantly, chimeric sequences will appear as completely different organisms and drastically raise the number of species in a sample, and therefore the diversity of the sample. Third-generation sequencing based on single molecules, such as Oxford Nanopore, eliminates the issues introduced by PCR because PCR is no longer a necessary part of the sequencing process.
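
The difference between a point error and a chimera can be seen in a toy clustering example. This is a minimal sketch under stated assumptions: the sequences are artificial, identity is a simple position-by-position match between equal-length strings, and the 97% threshold with greedy clustering is a commonly used convention chosen here for illustration rather than the method of any particular pipeline.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def count_clusters(seqs, threshold=0.97):
    """Greedy clustering: a sequence starts a new cluster unless it matches
    an existing cluster representative at or above the identity threshold."""
    representatives = []
    for s in seqs:
        if not any(identity(s, r) >= threshold for r in representatives):
            representatives.append(s)
    return len(representatives)


# Two artificial 100-base "species" that differ at every position
parent_a = "ACGT" * 25
parent_b = "TGCA" * 25

# A single-base PCR error in parent_a (99% identical to parent_a)
point_error = "T" + parent_a[1:]

# A chimera: first half of parent_a joined to second half of parent_b
chimera = parent_a[:50] + parent_b[50:]

print(count_clusters([parent_a, parent_b]))               # true community
print(count_clusters([parent_a, parent_b, point_error]))  # with a point error
print(count_clusters([parent_a, parent_b, chimera]))      # with a chimera
```

The point error stays inside its parent's cluster, while the chimera is only 50% identical to either parent and is counted as an extra species.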


Module 03 references

Welch et al. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. PNAS. 99(26). DOI: 10.1073/pnas.252529799. https://www.ncbi.nlm.nih.gov/pubmed/12471157